REAL TIME SOUND TO NOTE CONVERTER - AudioToMidi - (Freeware ver.1.01) 1.0 FEATURES 2.0 MINIMUM SYSTEM REQUIREMENTS 3.0 INSTALLATION 4.0 OPERATION PRINCIPLE 5.0 OPERATION 6.0 LICENSE 7.0 CONTACT 1.0 FEATURES The present software allows the conversion of a standard audio signal to MIDI signal with an insignificant delay. The resulting MIDI signal can be given to a standard MIDI device, PC speaker, built-in sequencer. The audio signal spectrum is displayed in a special window in real time. The present software also allows special options: -Input audio signal volume normalization; -Input audio signal correction by built-in graphic equalizer; -Consideration of possible audio signal frequency deviation from the standard note frequencies, e.g., because of the difference between the guitar sound and the tuning fork; -Note sensor selectivity meaning the sensor sensibility of a particular note to the adjacent note signals. The graphic representation of the selectivity is provided. The parameter of this option impacts the conversion delay time. -Note volume filter allowing to ignore low notes and noise; -Output note filter, allowing to ignore the notes of pre-set loftiness when MIDI signal generation. The note interval and/or presumed key can also be set. -Note duration filter allowing to ignore accidental short notes when writing to the built-in sequencer. -Graphic simulation of the first four harmonics of recognized instrument or voice; -MIDI signal transposition. Resulting note loftiness shift at the integer number of semi-tones. -MIDI instrument selection when signal generation; -Resulting note volume selection. -MIDI channel selection when MIDI signal generation. -Monophonic mode which allows the lowest note selection from a number of simultaneous notes, thus providing the separation of the first instrument/voice harmonic, and errorless conversion of monophonic melodies. The general setting is automatically saved. Separate saving, opening and resetting of the equalizer, harmonic model and filter settings are provided. The conversion result is presented in real time by highlighted piano keys. The piano keyboard window can also be used to generate the sound corresponding to the pressed key. The signal spectrum representation allows the software usage for the comfortable guitar tuning. The visual peaks must be symmetrical at the middle position of the “Tune” control slider. The built-in sequencer allows opening and playing MIDI (*.mid) and RIFF MIDI (*.rmi) files. New tracks can be also added to the open files. A new record in the sequencer is made by addition of a new track. Thus, MIDI record can be created from several tracks. The record can be saved in MIDI or RIFF MIDI files. The software provides the selection of the input audio device and the output MIDI device. Due to the selection option of the output MIDI device, AudioToMidi is able to operate with an external software sequencer. The driver Sonic Foundry Virtual MIDI Router (VMR) is recommended for this purpose. The distribution conditions of this file with the description in English are found in the enclosed file Sonic Foundry MIDI Router.wri (the driver package is not included). MIDI signal, both real time and written to the built-in sequencer, can be applied to the external software sequencer. In the latter case all tracks are combined into one track. The audio *.wav, *.mp3, *.au files and the like can be converted by the playback by the appropriate software giving the sound to the audio device selected at the AudioToMidi input (usually Wave Mapper). 2.0 MINIMUM SYSTEM REQUIREMENTS Processor: P75. OS: Windows 95 or Windows NT. Memory: Approximately 1MB free ROM. Hard disk: Approximately 1MB free space. Devices: Any sound card, which is not worse than SB16 in possibility. Drivers and application programs of the sound card must be installed to OS. Note: No sound is given to PC speaker under Windows NT, the appropriate option is disabled. 3.0 INSTALLATION This software product is installed by simple copying of files to any directory preserving the archive catalog tree structure. No additional libraries or drivers are required. The file AudioToMidi.exe runs the software. 4.0 OPERATION PRINCIPLE A continuous sample flow representing a digitized sound from the “Wave In” list device is given to the program input. This signal is given to the massive of sensors; each tuned at a particular frequency. This frequency is equal the frequency of the note associated with the sensor, at some possible correction. The value representing the sound intensity within the domain of the sensor own frequency is generated at the each sensor output. These values are graphically presented in the “Spectrum” window. The note frequency is commonly calculated. Note A of the first octave has the frequency of 440 Hz. When the note is raised or lowered at 1/2 tone, the frequency is multiplied at or divided by the value, equal to the 12-power root of two. Hence, if the note is raised or lowered at 12 semi- tones, i.e., at one octave, the frequency is multiplied at or divided by 2. A of small octave matches 220 Hz, A of large octave - 110 Hz, A of the second octave - 880 Hz, A of the third octave - 1760 Hz, etc. The correction of the sensor frequency is a function of the position of “Tune” slider. At the middle position of the slider the correction is equal to zero. At the most left position the correction makes the frequency match the note frequency, which is 1/2 tone below the note associated with the sensor. At the most right position the frequency matches the frequency of the note which is 1/2 tone above the note associated with the sensor. If the slider is moved smoothly from the most left to the most right position, the frequency of each sensor is also smoothly changed from the lowest to the highest value. Sensor sensitivity individually set by the “Equalizer” control. “Sensitivity” control changes the sensitivity of all sensors simultaneously. The sensor sensitivity is increased when moving the “Equalizer” or “Sensitivity” slider upward. The selectivity of sensors set by the “Selectivity” control is graphically represented in the respective window. The plot shows the sensor selectivity as a function of the audio wave frequency. The middle vertical line corresponds to the sensor own frequency. The adjacent vertical lines correspond to the frequency values differentiated by the 12-power root of two from the sensor own frequency. The frequency by the abscissa axis grows from left to right by the logarithmic scale. The selectivity plot is almost symmetric with the maximum at the own frequency. The sensors are characterized by some inertia, displaying the rate of reaction on the appearance or disappearance of the sound in the own frequency domain. The selectivity and inertia of the sensor strictly depends one from another. The better is the sensor selectivity (the narrowest selectivity plot), the more this sensor is inert, i.e. slow. The optimum selectivity value is experimentally chosen being dependent from a particular converted melody, its, tempo, polyphony, percussion, average note duration, etc. The values generated at the sensors’ outputs are periodically scanned to detect the peaks. The peak is the sensor with the output value above the values of both adjacent sensors. The scanning is performed from left to right, i.e., from the low note sensor to the high note sensor. Before the start of each scanning cycle a uniform threshold value is set for all sensors. This threshold value depending upon the position of the “Gate” slider is presented by a horizontal dotted line in the “Spectrum” window. When a peak is detected, the addition is made to the threshold value of the sensors presenting the notes above the current note at 12, 19 and 24 semi-tones. The addition to the threshold depends upon the peak sound intensity and the histogram shape in the “Harmonic Model” window. The histogram can be interpreted as follows: the first column presents the peak sound intensity. The second column presents the addition to the threshold value for the sensor being at 12 semi-tones away from the peak, the third column presents the same for the sensor being at 19 semi-tones away from the peak, and the fourth column – for the sensor being at 24 semi-tones away from the peak. The harmonic model principle of the algorithm is based upon the presence of supplemental harmonics in each voice or instrumental note in addition to the main tone. Such supplemental harmonics have frequencies differentiated from the tone frequency 2-, 3-, 4-fold and more. The harmonic proportions depend upon the musical instrument or singer. The second harmonics is above the main first harmonics at 12 semi-tones exactly, the third harmonics is above the main first harmonics at 19 semi- tones with high precision, and the fourth is above the main first harmonics at 24 semi-tones exactly. The algorithm is limited by the first four harmonics, as the conversion quality is not greatly improved by further growth of the harmonics number, but the complexity is increased. If the peak sound intensity is above the threshold value, the appropriate note generation signal is enabled. Otherwise the note disabling signal is generated, if the note was enabled. In monophonic operation mode the scanning cycle is interrupted after the first enabled note. The previous note is disabled if different from the current note. Monophonic operation mode is set by the selection of the “Single Voice” flag in the “Method” group, and polyphone operation mode is set by the selection of the “Poliphony” flag. The note format signal generated at this stage is filtered by the note loftiness. The filter tuning is provided in the “Filter” window (located between the piano keyboard window and “Equalizer” window). The notes with the appropriate elements on in the “Filter” window only are subjected for further processing. The signal processed by note filtering is transposed, i.e., the note loftiness is shifted at the integer number of semi-tones, on the condition that the pre-set number of semi-tones is other from zero. The number of semi- tones for transposing is set by the “Transpose” control. At the positive control element value the notes are raised by transposing, and are lowered at the negative value. Then the signal is directed by three separate branches: to MIDI device selected from the “Midi Out” list, to PC speaker and to built-in sequencer. The signal is applied to the MIDI device if the “MIDI” flag is marked in the “Play/Keep silence” group, to PC Speaker - if the “PC Speaker” flag is marked, and to the built-in sequencer - in the record mode, i.e., when the “Record” button is pressed. The notes are filtered by duration when writing to the built-in sequencer. The notes shorter than the value in milliseconds set in the “Minimal Duration” window are ignored. The resulting MIDI signal given to the built-in sequencer and to MIDI device is generated with the consideration of the selected MIDI instrument, volume and MIDI channel. The instrument is set in the “Outlet MIDI Instrument” list, the volume is set by the “Volume” control, and the channel is set in the “MIDI Channel” list. 5.0 OPERATION The software must be tuned before conversion by setting the optimum values of all parameters during the test playing of the melody being converted. Make sure to set a flag for the device sound from which being converted in the Windows “Volume Control” program. Select the “Properties” of the “Options” menu and set the "Adjust volume for" switch to "Recording" to enable the necessary section. Set the record level of this device to normal position in this program. First, move the “Selectivity” slider to the middle position in the AudioToMidi program. This position provides the most comfortable selectivity to start the software tuning. Make the spectrum representation fit smoothly the “Spectrum” window by the “Sensitivity” control. The audio signal frequency characteristics should possibly be corrected by equalizer. Set the equalizer sliders’ positions by pressing the mouse left button or moving the mouse cursor with pressed button. The popup menu with save/open/reset equalizer commands is enable in the equalizer window by pressing the mouse right button. Make the explicit visual peaks in the “Spectrum” window symmetric by the “Tune” control. For example, if the guitar is tuned at 1/4 tone below the tuning fork, the slider should be moved at 1/4 control element scale to the left from the center. Proper guitar tuning is recommended, otherwise, the guitar and converted sound are not matching. When tuning the guitar, set the “Tune” slider in the middle position. The guitar can be comfortably tuned with open strings. First, make the first harmonic peak be placed at the appropriate note level by changing the string tension. Second, make the visual peak symmetric by more precise guitar tuning. Enable the “PC Speaker” or “MIDI” option for audible control, monophonic program mode is recommended. The next possible stage is the harmonic histogram setup in the “Harmonic Model” window. The histogram column height is set by clicking the mouse left button with the cursor at the necessary point or by moving the mouse cursor with pressed button. The popup menu with save/open/reset commands of the histogram settings is enabled in the window by pressing the mouse right button. The histogram columns should possibly match the recognized instrument harmonics. The proportion of harmonics can be viewed in the “Spectrum” window with a single instrument note sounding and without any other sounds. If the harmonic proportions cannot be defined set the histogram similar to the default one by its shape. Set experimentally the best note inclusion threshold by the “Gate” control. Vary the selectivity for the best conversion result. After the selectivity is changed, correct the general sensor sensitivity by the “Sensitivity” control. Set the passed note set in the “Filter” window to reduce the number of unwanted accidental notes. A note is either included to or excluded from the set by clicking the mouse left button with the cursor at the necessary point or by moving the cursor with the mouse button pressed. This can be done to all of the same name notes by double-click of the mouse left button in the note area. Another way to do that is to click the mouse right button at the selected note key in the piano keyboard window. The popup menu with save/open/reset commands of the filter settings is enabled in the “Filter” window by pressing the mouse right button. The cleaning command for the passed note set is also available in the menu. The conversion quality can be improved by narrowing the interval of passed notes by the filter tuning. As a rule, the percussion noise is concentrated in the low frequency domain. Consequently, if the musical piece has percussion instruments, the lower interval should be cut. In many cases the melody to be converted is limited by two or three octaves, and in rare cases it is limited by four or more octaves. Hence, the upper interval of passed notes can also be cut. The musical piece being converted may have a certain key. In this case the passed note set must have only the notes associated with the pre-set key with possible addition of raised or lowered key degrees. The software package includes the ready-made files of filter settings for all possible major and minor keys. The files for minor keys with the added raised seventh degree are also included. The file “Joe Dassin (+_Bm7).mid” obtained by the melody conversion in B minor key with the added raised seventh degree, i.e., A sharp, is enclosed as an example. The conversion result can be listened to in real time if the “MIDI” option of “Play/Keep silence” group is on. Monophonic melody result can be listened to by the built-in speaker, if the “PC Speaker” is on. Select the wanted MIDI instrument from the “Outlet MIDI Instrument” list for final MIDI signal. The wanted volume is set in the window “Volume”. The number of the shift semi-tones is set in the window “Transpose”. If several sound tracks each intended for a particular MIDI instrument are used for generation, a unique MIDI channel must be set for each track. Select a channel from the “MIDI Channel” list. Note: tenth channel is allocated for the percussion instruments and sound effects. Select the optimum value of the minimum note duration in milliseconds in the “Minimal Duration” window. Press the button “Record” to record the track in the built-in sequencer. Press the button “Pause” or start recording from playing station to listen to the tracks with a new track being imposed when recording. 6.0 LICENSE This software product is a freeware version. The author is not responsible neither for possible errors related to the operation of this software product nor for the consequences of such errors. The author rights to this software product are the property of Alexey Egorov. This software product cannot be sold, commercialized or distributed in a modified. This software product can be distributed together with the present documentation. 7.0 CONTACT Please find the information and files related to this product at http://www.midi.ru/AudioToMidi/. Please mail your suggestions and tell about the bugs to Alexey Egorov alegorov@mail.ru. Alexey Egorov alegorov@mail.ru http://www.midi.ru/AudioToMidi/ September 7, 1999